Ethnic group


A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

Scott, Michael, Liang, Siyu, Wassink, Alicia, Levow, Gina-Anne

arXiv.org Artificial Intelligence

This paper presents a systematic evaluation of racial bias in four major commercial automatic speech recognition (ASR) systems using the Pacific Northwest English (PNWE) corpus. We analyze transcription accuracy across speakers from four ethnic backgrounds (African American, Caucasian American, ChicanX, and Yakama) and examine how sociophonetic variation contributes to differential system performance. We introduce a heuristically determined Phonetic Error Rate (PER) metric that links recognition errors to specific linguistically motivated variables derived from sociophonetic annotation. Our analysis of eleven sociophonetic features reveals that vowel quality variation, particularly resistance to the low-back merger and pre-nasal merger patterns, is systematically associated with differential error rates across ethnic groups, with the most pronounced effects for African American speakers across all evaluated systems. These findings demonstrate that acoustic modeling of dialectal phonetic variation, rather than lexical or syntactic factors, remains a primary source of bias in commercial ASR systems. The study establishes the PNWE corpus as a valuable resource for bias evaluation in speech technologies and provides actionable guidance for improving ASR performance through targeted representation of sociophonetic diversity in training data.
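
The PER metric is described only at a high level in the abstract. As a rough illustration of the idea, the sketch below computes an edit-distance word error rate and a heuristic error rate restricted to reference words carrying a sociophonetic annotation; the annotated_words set is a hypothetical stand-in for the PNWE annotation scheme, not the authors' actual metric.

```python
# Minimal sketch: word error rate plus a heuristic "phonetic error rate"
# that attributes errors to sociophonetically annotated words.
# The annotation set below is hypothetical, not the PNWE scheme.

def edit_ops(ref, hyp):
    """Levenshtein DP; returns (total errors, indices of errored ref words)."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Backtrace to find which reference words were involved in errors.
    errored, i, j = set(), n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1                        # match
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            errored.add(i - 1); i, j = i - 1, j - 1    # substitution
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            errored.add(i - 1); i -= 1                 # deletion
        else:
            j -= 1                                     # insertion
    return d[n][m], errored

def wer_and_per(reference, hypothesis, annotated_words):
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    total_errors, errored = edit_ops(ref, hyp)
    wer = total_errors / max(len(ref), 1)
    # Heuristic PER: fraction of annotated reference words that were misrecognized.
    annotated_idx = [i for i, w in enumerate(ref) if w in annotated_words]
    per = len([i for i in annotated_idx if i in errored]) / max(len(annotated_idx), 1)
    return wer, per

# Hypothetical words carrying a low-back-merger-sensitive vowel.
print(wer_and_per("the cot caught my dog", "the cat court my dog", {"cot", "caught", "dog"}))
```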


Dr. Bias: Social Disparities in AI-Powered Medical Guidance

Kondrup, Emma, Imouza, Anne

arXiv.org Artificial Intelligence

With the rapid progress of Large Language Models (LLMs), the general public now has easy and affordable access to applications capable of answering most health-related questions in a personalized manner. These LLMs are increasingly proving to be competitive, and now even surpass professionals in some medical capabilities. They hold particular promise in low-resource settings, considering they provide the possibility of widely accessible, quasi-free healthcare support. However, the evaluations that fuel these motivations largely lack insight into the social nature of healthcare, remaining oblivious to health disparities between social groups and to how bias may translate into LLM-generated medical advice and affect users. We provide an exploratory analysis of LLM answers to a series of medical questions spanning key clinical domains, where we simulate these questions being asked by several patient profiles that vary in sex, age range, and ethnicity. By comparing natural language features of the generated responses, we show that, when LLMs are used for medical advice generation, they generate responses that systematically differ between social groups. In particular, Indigenous and intersex patients receive advice that is less readable and more complex. We observe these trends amplify when intersectional groups are considered. Considering the increasing trust individuals place in these models, we argue for higher AI literacy and for urgent investigation and mitigation by AI developers to ensure these systemic differences are diminished and do not translate to unjust patient support. Our code is publicly available on GitHub.
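
The comparison of natural language features can be illustrated with off-the-shelf readability metrics. The sketch below assumes responses have already been generated for prompts differing only in the stated patient profile; the response strings and profile keys are placeholders chosen solely to show the metrics in action, and textstat is a standard readability library, not necessarily what the authors used.

```python
# Sketch of the comparison step only: given LLM answers already generated for
# prompts that differ only in the stated patient profile, compare readability.
# The responses below are placeholders, not model output from the paper.
import textstat  # pip install textstat

responses = {
    "white_male_30s": "Take ibuprofen with food and rest. See a doctor if pain persists.",
    "indigenous_female_30s": "Analgesic administration alongside nutritional intake is advisable; "
                             "persistent symptomatology warrants clinical consultation.",
}

for profile, text in responses.items():
    print(profile,
          "| Flesch reading ease:", textstat.flesch_reading_ease(text),
          "| grade level:", textstat.flesch_kincaid_grade(text))
```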


Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization

Lahnala, Allison, Varadarajan, Vasudha, Flek, Lucie, Schwartz, H. Andrew, Boyd, Ryan L.

arXiv.org Artificial Intelligence

The proliferation of ideological movements into extremist factions via social media has become a global concern. While radicalization has been studied extensively within the context of specific ideologies, our ability to accurately characterize extremism in more generalizable terms remains underdeveloped. In this paper, we propose a novel method for extracting and analyzing extremist discourse across a range of online community forums. By focusing on verbal behavioral signatures of extremist traits, we develop a framework for quantifying extremism at both user and community levels. Our research identifies 11 distinct factors, which we term "The Extremist Eleven," as a generalized psychosocial model of extremism. Applying our method to various online communities, we demonstrate an ability to characterize ideologically diverse communities across the 11 extremist traits. We demonstrate the power of this method by analyzing user histories from members of the incel community. We find that our framework accurately predicts which users join the incel community up to 10 months before their actual entry, with an AUC of >0.6 that steadily increases to approximately 0.9 three to four months before the event. Further, we find that upon entry into an extremist forum, users tend to maintain their level of extremism within the community while still remaining distinguishable from the general online discourse. Our findings contribute to the study of extremism by introducing a more holistic, cross-ideological approach that transcends traditional, trait-specific models.
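
The entry-prediction evaluation reduces to computing an AUC from user features snapshotted at varying lead times. The sketch below uses synthetic data in which the trait signal strengthens closer to entry; the feature construction is an assumption for illustration, not the authors' pipeline.

```python
# Minimal sketch of the evaluation logic, assuming per-user feature vectors
# (e.g., scores on the 11 extremist traits) snapshotted k months before the
# (possible) community-entry event. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_traits = 2000, 11
y = rng.integers(0, 2, n_users)  # 1 = user later joins the community

for months_before in (10, 6, 3):
    # Synthetic assumption: the trait signal strengthens closer to entry.
    signal = (11 - months_before) / 10.0
    X = rng.normal(0, 1, (n_users, n_traits)) + signal * y[:, None]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{months_before} months before entry: AUC = {auc:.2f}")
```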


Equitable Length of Stay Prediction for Patients with Learning Disabilities and Multiple Long-term Conditions Using Machine Learning

Abakasanga, Emeka, Kousovista, Rania, Cosma, Georgina, Akbari, Ashley, Zaccardi, Francesco, Kaur, Navjot, Fitt, Danielle, Jun, Gyuchan Thomas, Kiani, Reza, Gangadharan, Satheesh

arXiv.org Artificial Intelligence

People with learning disabilities have higher rates of mortality and premature death than the general population, as reported in published research in the UK and other countries. This study analyses hospitalisations of 9,618 patients identified with learning disabilities and long-term conditions for the population of Wales, using electronic health record (EHR) data sources from the SAIL Databank. We describe the demographic characteristics, prevalence of long-term conditions, medication history, hospital visits, and lifestyle history for our study cohort, and apply machine learning models to predict the length of hospital stays for this cohort. The random forest (RF) model achieved an Area Under the Curve (AUC) of 0.759 (males) and 0.756 (females), a false negative rate of 0.224 (males) and 0.229 (females), and a balanced accuracy of 0.690 (males) and 0.689 (females). After examining model performance across ethnic groups, two bias mitigation algorithms (threshold optimization and the reductions algorithm using an exponentiated gradient) were applied to minimise performance discrepancies. The threshold optimizer algorithm outperformed the reductions algorithm, achieving lower ranges in false positive rate and balanced accuracy for the male cohort across the ethnic groups. This study demonstrates the potential of applying machine learning models with effective bias mitigation approaches to EHR data sources to enable equitable prediction of hospital stays by addressing data imbalances across groups.
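
Both mitigation algorithms named in the abstract have standard implementations in the fairlearn library. The sketch below shows the threshold-optimization route on synthetic data standing in for the SAIL EHR features; the group labels and data-generating process are invented for illustration.

```python
# Sketch of the mitigation step with fairlearn (pip install fairlearn),
# on synthetic data standing in for the EHR features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import MetricFrame

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 8))
group = rng.choice(["A", "B", "C"], size=n)  # stand-in ethnic group labels
y = (X[:, 0] + (group == "A") * 0.5 + rng.normal(0, 1, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Post-process decision thresholds per group to equalize odds.
post = ThresholdOptimizer(estimator=rf, constraints="equalized_odds",
                          predict_method="predict_proba", prefit=True)
post.fit(X_tr, y_tr, sensitive_features=g_tr)
y_hat = post.predict(X_te, sensitive_features=g_te)

mf = MetricFrame(metrics=balanced_accuracy_score, y_true=y_te,
                 y_pred=y_hat, sensitive_features=g_te)
print(mf.by_group)  # per-group balanced accuracy after mitigation
```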


A Data Envelopment Analysis Approach for Assessing Fairness in Resource Allocation: Application to Kidney Exchange Programs

Kaazempur-Mofrad, Ali, Dai, Xiaowu

arXiv.org Artificial Intelligence

Kidney exchange programs have significantly increased transplantation rates but raise pressing questions about fairness in organ allocation. We present a novel framework leveraging Data Envelopment Analysis (DEA) to evaluate multiple fairness criteria (Priority, Access, and Outcome) within a single model, capturing complexities that may be overlooked in single-metric analyses. Using data from the United Network for Organ Sharing, we analyze these criteria individually, measuring Priority fairness through waitlist durations, Access fairness through Kidney Donor Profile Index scores, and Outcome fairness through graft lifespan. We then apply our DEA model to demonstrate significant disparities in kidney allocation efficiency across ethnic groups. To quantify uncertainty, we employ conformal prediction within the DEA framework, yielding group-conditional prediction intervals with finite-sample coverage guarantees. Our findings show notable differences in efficiency distributions between ethnic groups. Our study provides a rigorous framework for evaluating fairness in complex resource allocation systems where resource scarcity and mutual compatibility constraints exist. All code for using the proposed method and reproducing results is available on GitHub.
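
A DEA efficiency score can be computed per decision-making unit with a small linear program. The sketch below implements a standard input-oriented CCR model with scipy; the inputs, outputs, and values are invented for illustration and are not the paper's exact DEA formulation.

```python
# Minimal input-oriented CCR DEA sketch: efficiency of each decision-making
# unit given inputs (e.g., waitlist years, KDPI score) and outputs
# (e.g., graft lifespan). All data below is synthetic.
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(X, Y):
    """X: (n_units, n_inputs), Y: (n_units, n_outputs). Returns theta per unit."""
    n = X.shape[0]
    scores = []
    for o in range(n):
        # Variables: [theta, lambda_1 .. lambda_n]; minimize theta.
        c = np.r_[1.0, np.zeros(n)]
        # Inputs:  sum_j lambda_j * x_j  <= theta * x_o
        A_in = np.c_[-X[o][:, None], X.T]
        b_in = np.zeros(X.shape[1])
        # Outputs: sum_j lambda_j * y_j  >=  y_o
        A_out = np.c_[np.zeros((Y.shape[1], 1)), -Y.T]
        b_out = -Y[o]
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[b_in, b_out],
                      bounds=[(0, None)] * (n + 1))
        scores.append(res.fun)
    return np.array(scores)

X = np.array([[3.0, 45], [5.0, 60], [2.0, 30], [4.0, 55]])  # waitlist yrs, KDPI
Y = np.array([[12.0], [9.0], [14.0], [10.0]])               # graft lifespan yrs
print(dea_ccr_efficiency(X, Y))  # 1.0 = efficient, <1.0 = inefficient
```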


Unveiling Disparities in Maternity Care: A Topic Modelling Approach to Analysing Maternity Incident Investigation Reports

Cosma, Georgina, Singh, Mohit Kumar, Waterson, Patrick, Jun, Gyuchan Thomas, Back, Jonathan

arXiv.org Artificial Intelligence

This study applies Natural Language Processing techniques, including Latent Dirichlet Allocation, to analyse anonymised maternity incident investigation reports from the Healthcare Safety Investigation Branch. The reports underwent preprocessing, annotation using the Safety Intelligence Research taxonomy, and topic modelling to uncover prevalent topics and detect differences in maternity care across ethnic groups. A combination of offline and online methods was utilised to ensure data protection whilst enabling advanced analysis, with offline processing for sensitive data and online processing for non-sensitive data using the 'Claude 3 Opus' language model. Interactive topic analysis and semantic network visualisation were employed to extract and display thematic topics and visualise semantic relationships among keywords. The analysis revealed disparities in care among different ethnic groups, with distinct focus areas for the Black, Asian, and White British ethnic groups. The study demonstrates the effectiveness of topic modelling and NLP techniques in analysing maternity incident investigation reports and highlighting disparities in care. The findings emphasise the crucial role of advanced data analysis in improving maternity care quality and equity.
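
The topic-modelling step can be sketched with scikit-learn's LDA. The report snippets and group labels below are invented stand-ins for the non-public HSIB data; the point is only the mechanics of comparing topic prevalence across groups.

```python
# Sketch of the topic-modelling step with scikit-learn's LDA on toy report
# snippets; the grouping and texts are illustrative, not HSIB data.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    ("Black", "delayed escalation of fetal heart rate concerns during labour"),
    ("Asian", "interpretation services unavailable at triage assessment"),
    ("White British", "postnatal haemorrhage recognised late on the ward"),
    ("Black", "escalation delayed despite abnormal observations in labour"),
]
groups, texts = zip(*reports)

vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)  # per-report topic proportions

# Mean topic prevalence per ethnic group.
for g in sorted(set(groups)):
    mask = np.array([gr == g for gr in groups])
    print(g, doc_topics[mask].mean(axis=0).round(2))
```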


Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

Yuan, Yingfang, Chen, Kefan, Rizvi, Mehdi, Baillie, Lynne, Pang, Wei

arXiv.org Machine Learning

The growing interest in fair AI development is evident. The "Leave No One Behind" initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation and service scheme development, across various sectors such as health, energy, and housing. Therefore, exploring joint inequalities in these sectors is significant and valuable for thoroughly understanding overall inequality and unfairness. This research introduces an innovative approach to quantify cross-sectoral intersecting discrepancies among user-defined groups using latent class analysis. These discrepancies can be used to approximate inequality and provide valuable insights into fairness issues. We validate our approach using both proprietary and public datasets, including the EVENS and Census 2021 (England & Wales) datasets, to examine cross-sectoral intersecting discrepancies among different ethnic groups. We also verify the reliability of the quantified discrepancy by conducting a correlation analysis with a government public metric. Our findings reveal significant discrepancies between minority ethnic groups, highlighting the need for targeted interventions in real-world AI applications. Additionally, we demonstrate how the proposed approach can be used to provide insights into the fairness of machine learning.
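
Latent class analysis for binary indicators can be fit with a short EM loop. The sketch below is a generic binary-indicator LCA, not the authors' implementation; the sector indicators and group labels are synthetic, and group discrepancy is approximated by comparing class-membership distributions.

```python
# Compact EM for a binary-indicator latent class model (not the authors'
# implementation): fit K latent classes, then compare class-membership
# distributions between groups as a rough discrepancy measure.
import numpy as np

def fit_lca(R, K, n_iter=200, seed=0):
    """R: (n, d) binary matrix of sector indicators (e.g., deprived in
    health / energy / housing). Returns class priors, item probs, posteriors."""
    rng = np.random.default_rng(seed)
    n, d = R.shape
    pi = np.full(K, 1.0 / K)
    rho = rng.uniform(0.25, 0.75, (K, d))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each class for each respondent.
        log_lik = (R @ np.log(rho).T + (1 - R) @ np.log(1 - rho).T + np.log(pi))
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update class priors and item probabilities.
        pi = resp.mean(axis=0)
        rho = np.clip((resp.T @ R) / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, rho, resp

rng = np.random.default_rng(1)
R = (rng.random((500, 3)) < 0.4).astype(float)  # synthetic sector indicators
groups = rng.choice(["G1", "G2"], 500)          # synthetic group labels
_, _, resp = fit_lca(R, K=2)
for g in ("G1", "G2"):
    print(g, resp[groups == g].mean(axis=0).round(3))  # class shares per group
```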


I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations

Singh, Mohit Kumar, Cosma, Georgina, Waterson, Patrick, Back, Jonathan, Jun, Gyuchan Thomas

arXiv.org Artificial Intelligence

Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g., individuals' decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g., health conditions, procedures, and tests), overlooking the importance of human factors. We developed a new approach called I-SIRch, using artificial intelligence to automatically identify and label human factors concepts in maternity healthcare investigation reports describing adverse maternity incidents, produced by England's Healthcare Safety Investigation Branch (HSIB). These incident investigation reports aim to identify opportunities for learning and improving maternal safety across the entire healthcare system. I-SIRch was trained using real data and tested on both real and simulated data to evaluate its performance in identifying human factors concepts. When applied to real reports, the model achieved a high level of accuracy, correctly identifying relevant concepts in 90% of the sentences from 97 reports. Applying I-SIRch to analyse these reports revealed that certain human factors disproportionately affected mothers from different ethnic groups. Our work demonstrates the potential of using automated tools to identify human factors concepts in maternity incident investigation reports, rather than focusing solely on biomedical concepts. This approach opens up new possibilities for understanding the complex interplay between social, technical, and organisational factors influencing maternal safety and population health outcomes. By taking a more comprehensive view of maternal healthcare delivery, we can develop targeted interventions to address disparities and improve maternal outcomes.
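
Concept annotation of this kind can be framed as multi-label sentence classification. The sketch below uses a TF-IDF one-vs-rest classifier on invented sentences and labels loosely modelled on a human-factors taxonomy; it is not I-SIRch itself.

```python
# Sketch of sentence-level concept annotation as multi-label text
# classification; sentences and labels are invented stand-ins for a
# human-factors taxonomy, not the I-SIRch model or its training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

train_sentences = [
    "The midwife's decision to delay escalation was influenced by workload.",
    "The ward layout made it difficult to locate emergency equipment.",
    "Staff fatigue contributed to the missed observation.",
    "The resuscitaire in the delivery room was not checked.",
]
train_labels = [["decision-making", "workload"], ["local facilities"],
                ["fatigue"], ["local facilities"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(train_labels)
clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
clf.fit(train_sentences, Y)

test = ["Equipment in the room was hard to find during the emergency."]
print(mlb.inverse_transform(clf.predict(test)))  # e.g., [('local facilities',)]
```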


DemOpts: Fairness corrections in COVID-19 case prediction models

Awasthi, Naman, Abrar, Saad, Smolyak, Daniel, Frias-Martinez, Vanessa

arXiv.org Artificial Intelligence

COVID-19 forecasting models have been used to inform decision making around resource allocation and intervention decisions, e.g., hospital beds or stay-at-home orders. State-of-the-art deep learning models often use multimodal data, such as mobility or socio-demographic data, to enhance COVID-19 case prediction models. Nevertheless, related work has revealed under-reporting bias in COVID-19 cases as well as sampling bias in mobility data for certain minority racial and ethnic groups, which could in turn affect the fairness of COVID-19 predictions along race labels. In this paper, we show that state-of-the-art deep learning models output mean prediction errors that are significantly different across racial and ethnic groups, and which could, in turn, support unfair policy decisions. We also propose a novel de-biasing method, DemOpts, to increase the fairness of deep learning based forecasting models trained on potentially biased datasets. Our results show that DemOpts can achieve better error parity than other state-of-the-art de-biasing approaches, thus effectively reducing the differences in the mean error distributions across more racial and ethnic groups.
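
The abstract does not specify how DemOpts works internally, so the sketch below is not that method; it only illustrates one generic error-parity idea: adding a training penalty that pulls per-group mean forecast errors toward the overall mean.

```python
# NOT the DemOpts implementation (unspecified in the abstract); a generic
# error-parity penalty sketch for a regression-style forecasting model.
import torch

def parity_penalized_loss(pred, target, group_ids, lam=1.0):
    resid = pred - target
    mse = (resid ** 2).mean()
    overall = resid.mean()
    penalty = 0.0
    for g in group_ids.unique():
        # Penalize deviation of each group's mean error from the overall mean.
        penalty = penalty + (resid[group_ids == g].mean() - overall) ** 2
    return mse + lam * penalty

# Toy setup: counties with a group label and a systematically biased target.
torch.manual_seed(0)
n = 512
g = torch.randint(0, 3, (n,))
x = torch.randn(n, 4)
y = x[:, 0] + 0.8 * (g == 2)  # group 2 is systematically under-predicted
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = parity_penalized_loss(model(x).squeeze(-1), y, g, lam=5.0)
    loss.backward()
    opt.step()
```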


Auditing the Fairness of COVID-19 Forecast Hub Case Prediction Models

Abrar, Saad Mohammad, Awasthi, Naman, Smolyak, Daniel, Frias-Martinez, Vanessa

arXiv.org Artificial Intelligence

The COVID-19 Forecast Hub was founded in 2020 and serves as a "central repository of COVID-19 forecasts from over 50 independent research groups" [1]. Participating research groups submit county, state, and national US COVID-19 forecasts in a standardized format, and the Forecast Hub provides an interactive visualization tool to help decision makers and the general public analyze weekly predictions for COVID-19 hospitalizations, cases, and deaths. The standardized predictions collected from all research groups, as well as the predictions of an ensemble model that brings all individual predictions together, are also shared with the Centers for Disease Control and Prevention (CDC), which uses these results for its official COVID-19 communications [2]. The COVID-19 Forecast Hub has been, and continues to be, a critical centralized resource to promote transparent decision making. Nevertheless, by focusing exclusively on prediction accuracy at different spatial granularities (e.g., county or state), the Forecast Hub fails to evaluate whether the proposed models are fair, i.e., whether they share similar prediction performance across social determinants known to play a role in COVID-19, including race, ethnicity, and rurality [3, 4]. Divergent prediction performance across social determinants (for example, higher prediction errors for a given minority race or ethnicity) could negatively impact resource allocation and intervention decisions, e.g., hospital beds or stay-at-home orders, given that the CDC appears to be using the Forecast Hub predictions for official communications that subsequently inform policy decisions [2]. In other words, allocation or intervention harms might occur if models from the Forecast Hub are used to inform decision making across communities without taking fairness metrics into account [5]. There are many reasons why COVID-19 prediction performance can differ across social determinants such as race, ethnicity, or urbanization levels. The Forecast Hub's COVID-19 prediction models are trained on datasets containing COVID-19
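
The audit itself amounts to comparing forecast errors across counties bucketed by a social determinant. The sketch below uses invented county data and column names; the Kruskal-Wallis test is one reasonable nonparametric check of whether error distributions differ by group, not necessarily the statistic used in the paper.

```python
# Sketch of the audit step: compare county-level forecast errors across
# majority-race buckets; data and column names are illustrative.
import pandas as pd
from scipy.stats import kruskal

df = pd.DataFrame({
    "county": ["a", "b", "c", "d", "e", "f"],
    "abs_error": [12.0, 30.5, 8.2, 25.1, 9.9, 28.4],
    "majority_group": ["white", "black", "white", "black", "white", "black"],
})

print(df.groupby("majority_group")["abs_error"].agg(["mean", "median"]))
samples = [s["abs_error"].values for _, s in df.groupby("majority_group")]
print(kruskal(*samples))  # tests whether error distributions differ by group
```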